Studying the Learning Curves of a Statistical Dependency Parser for Four Languages

Author

  • Atanas Chanev
Abstract

Multilingual dependency parsing has gained popularity in recent years for several reasons. Dependency structures are better suited than the traditional notion of constituency to languages with freer word order. Dependency treebanks are increasingly available for new languages. Broad-coverage statistical dependency parsers are available and easily portable to new languages. Dependency parsing can contribute to areas such as information extraction, machine translation and question answering, among others. In addition, syntactic head-dependent pairs are a good interface between traditional phrase structures and semantic theta roles. In this paper we present the learning curves of a statistical dependency parser for four languages: Arabic, Bulgarian, Italian and Slovene. We discuss issues that mostly concern the annotation scheme employed for each treebank, with an emphasis on coordinated structures. We also investigate how these issues relate to the learning curve for each language.


Similar resources

Statistical Dependency Parsing of Four Treebanks

Multilingual dependency parsing is gaining popularity in recent years for several reasons. Dependency structures are more adequate for languages with freer word order than the traditional constituency notion. There is a growing availability of dependency treebanks for new languages. Broad coverage statistical dependency parsers are available and easily portable to new languages. Dependency pars...


Feature Engineering in Persian Dependency Parser

A dependency parser is one of the most important fundamental tools in natural language processing; it extracts the structure of sentences and determines the relations between words based on dependency grammar. Dependency parsing is well suited to free-word-order languages such as Persian. In this paper, a data-driven dependency parser has been developed with the help of a phrase-structure parser fo...


Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, the Penn Treebank, was published. However, although they originated in computational linguistics, the value of tree banks is becoming more widely appreciated in linguistics research as a whole. F...


Learning to Search for Dependencies

We create a transition-based dependency parser using a general-purpose learning-to-search system. The result is a fast and accurate parser for many languages. Compared to other transition-based dependency parsing approaches, our parser provides statistical and computational performance similar to the best-known approaches while avoiding various downsides including randomization, extra feature req...


A Rule-Based Approach for Automatically Converting Dependency Parse Trees to Phrase-Structure Parse Trees for Persian

In this paper, an automatic method for converting a dependency parse tree into an equivalent phrase-structure tree is introduced for the Persian language. In the first step, a rule-based algorithm was designed. Then, Persian-specific dependency-to-phrase-structure conversion rules were merged into the algorithm. Subsequently, the Persian dependency treebank with about 30,000 sentences was used as an input ...




Publication date: 2006